Dynamic learning of individual-level suicidal ideation trajectories to enhance mental health care

There has recently been an increase in ongoing patient-report routine outcome monitoring for individuals within clinical care, which has corresponded to increased longitudinal information about an individual. However, many models that are aimed at clinical practice have difficulty fully incorporating this information. This is in part due to the difficulty in dealing with the irregularly time-spaced observations that are common in clinical data. Consequently, we built individual-level continuous-time trajectory models of suicidal ideation for a clinical population (N = 585) with data collected via a digital platform. We demonstrate how such models predict an individual’s level and variability of future suicide ideation, with implications for the frequency that individuals may need to be observed. These individual-level predictions provide a more personalised understanding than other predictive methods and have implications for enhanced measurement-based care.

the event of a "True" response, the participant is required to then interpret the perceived distress of that on a Likert scale (0-3, with 3 being severe distress).A cut-off score of 6 or more on the symptom-scale has been shown to detect at-risk mental states with 87% specificity and sensitivity 7 .
Mania-like experiences ASRM -The Altman Self-Rating Mania Scale (ASRM) is a 5-item scale used to assess the presence and/or severity of manic-like symptoms during the past 7 days.Each item is scored on a 5-point scale (0-4), with total scores ranging from 0-20 and higher scores indicating greater severity.A score greater than 5 indicates a high likelihood of a manic or hypomanic condition, as per 86% sensitivity and 87% sensitivity 9 .

Functioning
Youth not in education or employment (NEET) -Engagement with either employment, education, or training was based on questions from the Organisation of Economic Cooperation and Development (OECD), and Census of Population and Housing, Australian Bureau of Statistics (ABS) 10 .These four multiple choice questions were: 1) Are you currently engaged in education or study (e.g.school, TAFE or university) on a regular basis?; 2) Are you currently engaged in paid employment or work on a regular basis?; 3) Are you currently engaged in voluntary work through an organisation or group on a regular basis?; and 4) Are you currently providing unpaid care, help or assistance to family members or others because of a disability, a long term illness or problems related to old age on a regular basis?Individuals not involved in employment, education, or training were classified as NEET.
WSAS -The Work and Social Adjustment Scale (WSAS) is a brief and reliable measure of work and social adjustment.The questions aim to assess whether an individual is currently impaired or unable to perform day-to-day tasks, due to their mental health.The scale consists of five items that require the individual to rate between 0-8 ("not at all" to "very severely"), based on their agreement with the statement.A maximum total score is achieved by summing all 5 items.A total score greater than 20 suggests moderately severe psychopathology, 10-20 suggests significant functional impairment with less severe symptomatology, and scores under 10 are associated with subclinical populations.The scale has a test-retest correlation of 0.73 and has correlations of 0.76 for severity of depression and 0.61 for obsessivecompulsive disorder symptoms 11 .

SOFAS -
The Social and Occupational Assessment Scale is a single question that asks an individual to rate their social and occupational functioning 12 .The question on the Innowell platform asks, "Thinking about your ability to participate in everyday social, school (including university/college) or work activities, how much of your "usual activities have been limited?".The individual is then allowed to select 7 options that range from "Only everyday problems or concerns (e.g., mild anxiety before an exam or an occasional argument with family members)" (1) to "Inability to participate in almost all areas of life (e.g., stay in bed all day, or no job, home or friends or think about harming yourself)" (7).

Alcohol, tobacco, cannabis use
ASSIST -The Alcohol, Smoking and Substance Involvement Screening Test (ASSIST) is an 8item questionnaire that aims to detect substance use-related problems.It screens for use of tobacco, alcohol, cannabis, cocaine, amphetamine-type stimulants, sedatives and sleeping pills, hallucinogens, inhalants, opioids, and other drugs.Besides the first question which concerns life-time experiences, each question requires the individual to respond to questions that concern the prior 3-months on either 5-or 3-point Likert scales 13 .
AUDIT-C -The Alcohol Use Disorders Identification Test (AUDIT-C) is a brief measure consisting of three questions related to the frequency of general and binge alcohol consumption over the past year 14 .Each question has 5 options that increase in frequency or quantity of consumption (scores range from 0-4) 15 .A total score of 0-3 indicates low-risk drinking, 4-5 moderate risk, and scores greater than five indicates high risk drinking, however, this may not apply if total points come from q1 (i.e., when q2 and q3 =0).

Social connection
SSSS -The Schuster's Social Support Scale (SSSS) is a 6-item questionnaire that aims to assess the frequency of both supportive and negative interactions with family and friends 16 .The first five questions are scored on a 4-point Likert scale (0-3; "never" to "often") with alpha reliability ranging from 0.56-0.75, and the sixth question requires a Yes/No response 16 .

QIDS-SR -
The Quick Inventory of Depressive Symptomatology -Self-report (QIDS-SR) consists of 16-items that aim to assess nine domains of depression during the preceding seven days 17 .These domains are related to the DSM-IV diagnosis of a major depressive disorder, including: sleep, mood, appetite/weight, concentration/decision making, self-view, suicidal ideation, general interest, energy level, and agitation.Each item is scored on a scale between 0-3 points and scoring instructions determine the total score (which ranges from 0-27) 18  Anxiety OASIS -The Overall Anxiety Severity and Impairment Scale aims to assess the severity and impairment of anxiety-related symptoms over the past seven days 19 .The scale consists of five multiple choice questions with five options that are scored from 0-4, with higher scores indicating greater severity and/or impairment.A cut-off score of 8 has high validity (87%) for identifying anxiety disorders 20 .

Physical health
Height, weight, and waist circumference Body mass index (BMI) is calculated by Weight/(Height 2 ) to estimate total body fat in proportion to total body weight.It is used to estimate risk of cardiovascular, metabolic, and other diseases.Waist circumference, however, is a more accurate estimate of visceral fat and more predictive of cardiovascular diseases.For women, a waist circumference of 80-87cm is considered increased risk, and 88+cm is greatly increased.For men, a waist circumference of 94-101cm is considered increased risk, and 102+cm is considered greatly increased risk 21 .
IPAQ -The International Physical Activity Questionnaire (IPAQ) aims to determine the average physical activity of an individual and can be scored on a continuous and/or categorial scale (high, moderate, or low) 22 .First, all activity is calculated in minutes.Second, minutes should be converted to metabolic equivalent of task (MET) minutes (multiply the minutes by the relevant scalar; walking =3.3, moderate activity=4, vigorous activity =8).Third, multiply MET minutes by number of days the activity was performed.For categorical scoring, high = over 3000 MET minutes a week OR over 1500 MET minutes per week with 3 or more days of vigorous exercise; moderate = at least 3 days of vigorous activity or walking of 30 or more mins per day, OR 5+ days of moderate intensity activity and/or walking (minimum 30 mins per day), OR 5+ days of walking, moderate intensity, or vigorous activity equating to 600+ MET minutes; and low = not meeting high or moderate.

Sleep-wake cycle PSQI -
The Pittsburgh Sleep Quality Index (PSQI) is a questionnaire that aims to assess sleep quality and disturbances over the past month 23 .It consists of nineteen-items that assess seven domains: sleep quality, sleep latency, sleep duration, habitual sleep efficiency, sleep disturbances, use of sleeping medication, and daytime dysfunction.Each of the seven component scores are summed together to form a global score 23 .
MCTQ -The Munich Chronotype Questionnaire (MCTQ) is a self-report scale that assesses bed-and rise-times, and self-assessment of individual chronotype 24 .The chronotype options range from extremely early to extremely late and is determined by using the midpoint between onset and offset of sleep.
Post-traumatic stress PC-PTSD-5 -The Primary Care PTSD Screen (PC-PTSD) is a five-item questionnaire that aims to assess PTSD symptoms which reflect the DSM-V diagnostic criteria 25 .Each item is scored as either Yes or No (1 or 0), and a maximum total score is 5.A cut-off of 3 is optimally sensitive (reduces false negatives; κ[1]) = 0.93, standard error = 0.041), yet a cut-off of 4 is optimally efficient (good balance between false positive and negatives; κ[0.5] = 0.63, standard error = 0.052) 25 .

Eating behaviours and body image EDE -
The Eating Disorder Examination Questionnaire (EDE-Q) is based on the eating disorder examination interview and aims to assess eating behaviours and body image disturbance over the past four weeks 26 .The questionnaire uses a combination of Likert scales and Yes/No options.

Supplementary Methods 2.
The following provides supplementary information with respect to the model parameterisation.

Notation
We use the following notation throughout this supplement: • Random variables are denoted by capital letters with their realisations denoted by lowercase.
• Bold font to represent vectors.
• Underline plus bold face to indicate a set of vectors.
• As short hand we define  : = {  ,  +1 ,  +2 , … ,   } for  > .We also use a similar notation for  where    :  means all realisation in the time interval from   to   inclusive of   and   .• Superscript asterisks are used for proposed parameters.
• Tilde accents are used for predictive variables.

State space models
Consider that we want to model the suicidal ideation trajectories for an individual .Our data is a sum of five questions each on a 10-point scale, leading to a total score that is discretised and bounded to the range [ = 0,  = 50].The observations can also be observed at any future time given by  from a baseline observation at time  = 0.This motivated us to use state space models.State space models consist of two processes; one of which describes the underlying latent state { , ∈ ℝ| ≥ 0} and another that links the latent state to an observed state { , ∈ ℤ| ≤  , ≤ ,  ≥ 0}.

Continuous time models
We assume that the latent states follow a continuous time model which is described by a stochastic differential equation (SDE 27 ).We consider two possible SDEs known as the Wiener and Ornstein-Uhlenbeck processes.Both of these models can be used to simulate realisations for a future latent state  ,+Δ given the current realised state  , , the time interval Δ, a set of parameters   , and a particle that follows  ,, ~ ( ,, ; 0, 1) using a transition function  .( ,+Δ | , ,   ,  ,, ).
The Wiener process is the continuous time equivalent to a random walk process.The discretised realisation of this continuous time process is,  w ( ,+Δ | , ,   ,  , ) =  , +  ,,  ,2 √Δ (1) The second term is a stochastic component that describes how far away the future state value can lie from its current value (also referred to as diffusion throughout the paper).
The Ornstein-Uhlenbeck process is the continuous time equivalent to an autoregressive process.The discretised realisation of this continuous time process is,  ou ( ,+Δ | , ,   ,  , ) =  ,4 + ( , −  ,4 ) − ,3 Δ +  ,,  ,2√ 1 This extends the Wiener process by adding a deterministic component that is a function of the difference between an individual's current state and a long-term constant  ,4 .When  ,3 > 0 this process drives the latent state towards the long-term constant ensuring a stationary process.If  ,3 < 0 then  , =  ,4 is an unstable equilibrium point and any deviation from that value will lead the latent state to be driven to infinity.

Linking procedure
The latent state is then linked to an observed state that is qualitatively consistent with our data by a procedure given by ( , | , , 2 ) where we assume that the observations follow a discrete and bounded normal distribution with a variance of   2 .The probability that an individual has an observation  , =  , is then given by, max(,− ∫ ( , ; ℎ( , ),   2 ) (3)   where (; ℎ( , ),  2 =   2 ) is the normal probability density function calculated at  where the mean value is given by the logit link function, The choice of linking procedure has been pre-supposed.Many linking procedures are possible, and may even be improvements on the current procedure, but have been left out of the scope of this paper.
We also assumed the observational variance value   2 .It is possible to make this a random variable which is also fitted.However, we found it difficult for this to converge to sensible values across multiple chains, where it would either converge to an infinitesimally small or extremely large value.We did explore changing the priors in the hope of fitting this more consistently but found that it was not possible unless we set a highly restrictive prior.As such, we set this to half an integer squared per item, then summed across multiple item questionnaires.The summed score has five questions, and therefore, we assume the variance is   2 = 2.5.We leave   2 out of most of the discussion and notation below for simplification reasons.

Unbounded parameters for fitting purposes
The parameters for an individual are shown above as   .We prefer to parameterise in the unbounded space and then convert to the bounded space, as such we introduce the parameters   .For the Wiener process we then have   = { ,1 , exp( ,2 )} which ensures a positive diffusion rate as  ,2 > 0. For the Ornstein-Uhlenbeck process we use   = { ,1 , exp( ,2 ) , exp( ,3 ) ,  ,4 } which ensures stationarity by enforcing  ,3 > 0. We label the function that performs the mapping   →   as  .(  |  ).

Hierarchical modelling
Now, we want to model the full set of  individuals and share information across the individuals in an hierarchical manner.This requires a probability distribution for the individual parameters that is constrained by the population.As such, we introduce a set of population parameters  = {  ,   | ∀  ∈   } where   are the indices for the set of parameters that are dealt with as random effects.Then an individual-level parameter  , will follow a normal distribution with population level mean   and precision   , such that  , ~ ( , ;   ,  2 =   −1 ).
In a Bayesian context placing priors on the population level distributions as above is required.The normal-gamma prior for the population level parameters was chosen as it is the semi-conjugate prior to the normal individual-level parameter distribution when we have unknown mean and variance.For the baseline parameter, we assume  1 = 1.5 and  1 = 0.2, which allows the population level precision to be in the range (0, 28] with 99% probability, and for the remaining parameters we assume   = 2 and   = 1, which allows for the population-level precision to be within the range (0, 7] with 99% probability, both of which are quite diffuse priors allowing for a wide range of individual-level variation.We allowed the precision uncertainty to be larger for the baseline parameter as those values are well constrained by the initial observation, whereas the remaining parameters are less well known, and thus we enforce slightly more prior information.We then assume that the population mean for the baseline parameters follows a standardised normal distribution, which assumes that the expected baseline observation falls in the range [2, 48] with 99.7% probability.The remaining parameters have their prior for the population mean shifted down by -2.Subtracting by two allows for the log of the population level mean drift and diffusion to be in the range (-5, 1) corresponding to [0.007, 2.718] after exponentiation with 99.7% probability.These ranges were chosen after simulating trajectories with various values for the drift and diffusion.Allowing the drift and diffusion rates to be greater than this leads to nearly instantaneous change, whereas below this allows rates of change that are approximately zero leading to no convergence towards the long-term constant.The population mean for the constant is then allowed to vary between [0.3, 36] in observational space with 99.7% probability, which allows for a potential treatment effect compared to the distribution for the baseline values.We note that we did explore varying these hyperparameters, but didn't find any significant change in the posterior distribution of results when doing so.

Fixed effects
We introduce an extra vector of parameters  = {  |∀  ∈   } where   are the indices for the set of parameters that are dealt with as fixed effects.We set  , =   for all  when the -the parameter is treated as a fixed effect.The prior for the fixed population parameters is set to   ~ (  ; 0,   2 = 5 2 ) for all  treated as fixed effects.
Correlated particle filter methods have been developed to deal with some of the problems that occur in particle filters [33][34][35] .The key to correlated particle filters is first to identify that the discretised realisation of the SDEs described previously are deterministic given a vector of particles  , .Sampling schemes using correlated particle filters leverage this property by incorporating them as random variables that become part of the sampling procedure.Then the particles follow a proposal distribution with initial values drawn from, ( , ) = ( , ; 0,    ) (12)   and subsequent draws from the proposal distribution given by, where    is the identity matrix of size equal to the number of particles   in the vectors  , and  , * .The value  is the degree of correlation between  , and  , * which we set to 0.999.Correlating the particles across iterations acts to reduce the variance in the difference in the log-likelihood which appears in the Metropolis-Hastings algorithm, thus allowing the sampling scheme to be more efficient.
Given the full set of random effects  = {  } =1  , particles  = { ,0:  } =1  where   is the time for each individual's final observation, and data  = { ,0:  } =1  , the posterior distribution that we would like to sample from is given by,  (, , , |).Following Wiqvist et al. 34 we split this up into the following blocks; Steps 1 and 2 are implemented using a Correlated Pseudo-Marginal Metropolis-Hastings (CPMMH) step.
Step 3 is implemented using a Gibbs sampler.Further motivation for the block formulations are discussed by Wiqvist et al. 34 and similarly by Botha, Kohn, & Drovandi 35 who explore other procedures.
Code for the sampling scheme is implemented in the Julia language (v1.8).The code began as a fork from the repository used by Wiqvist et al. 34 .The main changes to the code include; 1) allowing for a different number of particles per individual, 2) compartmentalising the model parameterisation from the sampling scheme, such that we can use the same sampling code across multiple models, and 3) minor refactorization to improve readability and understanding of the code to the authors.

Number of particles
Choosing the number of particles for any particle filter method is a fundamental problem 36 .Too many particles can significantly slow the code.Whereas too few particles leads to increased variance and bias of the log-likelihood estimator, which can lead to inefficient or inaccurate sampling of the posterior distribution, as discussed previously.
The number of particles required typically increases with the number of observations and complexity in the trajectories.In our case, the number of observations differs per individual.
Trajectories are also qualitatively different with different degrees of fluctuations that require a differing number of particles.As such, the optimum number of particles is different per individual.We broadly follow the recommendations by several authors 34,36 to estimate the number of particles required per individual.
We start by running four 'short' chains of 10000 iterations with  = 1000 particles for all individuals and use the maximum a posteriori (MAP) as a single point estimate.The parameters of this point estimate are then used to estimate the variance and correlation of the total log-likelihood (|, , ) = ∏ (   =1 |  , ,   ,   2 ).This is done by; 1) sampling the particles in accordance with their initial proposal ( , ) for all  and  and estimating the log-likelihood (|, , ), 2) sampling a subsequent set of particles in accordance with the proposal ( , * | , ) for all  and  then estimating the log-likelihood (|, ,  * ), 3) repeating steps 1-3 100 times to create vectors of estimated log-likelihood values, and 4) calculating the correlation between (|, , ) and (|, ,  * ) that we denote as   .
After this step, we increase the number of particles such that the variance of the total loglikelihood is below a threshold.The typical threshold for the total log-likelihood is usually .

Initial state
A good initial guess helps sampling algorithms work more reliably and efficiently.As such, we estimate  ,1 for each individual  using the logistic transformation of their initial value.For values lying on the boundaries (e.g.,  = ,  as defined previously) we perturb the value slightly so that it is within the boundaries such that a logistic function can be applied.Similarly, the initial value for  ,3 for each individual was taken as the mean of their observations after performing a logistic transformation in a similar method to above.Constructing good parameter guesses for the other variables is difficult and thus we used a trial-and-error process until we had values that were reasonably close to the final results we were obtaining across multiple runs.

Testing of the sampling scheme
The sampling scheme was tested by recovering parameters for simulated data.The simulated data was designed to follow similar patterns to the observed suicidal ideation trajectories within Innowell.We outline the testing procedure and results for the random effects Wiener process model that was used as our final model, but similar testing was completed for the other models.
First, we assume that there are  sim = 3000 individuals.The number of observations for each individual was assumed to follow a negative binomial distribution with   sim ∼ ( = 0.7778,  = 0.1375), where  and  are the maximum likelihood estimated values using the suicidal ideation data.We then take all individuals with greater than 10 observations ( sim = 49).The time interval between observations is assumed to be constant with Δ = 7 days.The true population parameters were set to;  sim = {{−2.5,1.0},{−1.0,1.0}}.The parameters for each individual were drawn from the population distributions conditional on the population parameters.We then simulated trajectories in accordance with the trajectory dynamics and measurement processes defined above.
The sampling scheme was ran four times.Each run was performed for 5 × 10 4 iterations with 10% of chains removed as burn-in.We examined the convergence and resolution of these samples 39 .We used the Gelman-Rubin potential scale reduction factor ( ̂.) and it's multivariate equivalent ( ̂. p ) as a quantitative measure of convergence 40,41  We then checked that the true values were consistent with the estimated posterior distribution.For the population parameters we found that all true values were within the 95% ETI.For the individual parameters we calculated the number of times that the true values were within the 50% ETI.Consistent with expectation we found that 50.1% (SE, 1.9%) of parameters met this criterion.Thus, the estimated posterior distribution has support over the true values.

Diagnostic tests of the sampling scheme for the Innowell data
The reported results for the Innowell data come from four separate sampling chains that were run for 10 5 iterations.Every iteration was saved for the log-likelihood and population parameters.As the number of individual parameters is large, we limited memory usage by only saving every 10 th iteration for .We removed the first 10% of the samples as a burn-in.
We presupposed the percentage samples to remove as a burn-in but checked that this was greater than the Raftery-Lewis criteria 42 .All estimated values are calculated after combining the remaining samples from each chain.
We examined the convergence and resolution of these samples using the same techniques as in the previous section 39 .We found reasonable convergence and resolution for the total loglikelihood ( ̂, ).We also qualitatively checked the trace plots for these values by eye.We perform similar checks for other quantities of interest that are reported throughout the paper, including for the predicted trajectories shown in Figure 2 and  3.
We also performed posterior predictive checks 43,44 to further understand the model specification.We checked whether a follow-up observation at the next time lies within the 95% ETI of the posterior predictive distribution given all prior observations for each observation where  > 0. We found that 96.1% (SD, 0.7%) met this criterion, which is consistent with the expected 95% value.